WTF-LOD - A New Resource for Large-Scale NER Evaluation

نویسندگان

  • Lubomir Otrusina
  • Pavel Smrz
چکیده

This paper introduces the Web TextFull linkage to Linked Open Data (WTF-LOD) dataset intended for large-scale evaluation of named entity recognition (NER) systems. First, we present the process of collecting data from the largest publically-available textual corpora, including Wikipedia dumps, monthly runs of the CommonCrawl, and ClueWeb09/12. We discuss similarities and differences of related initiatives such as WikiLinks and WikiReverse. Our work primarily focuses on links from “textfull” documents (links surrounded by a text that provides a useful context for entity linking), de-duplication of the data and advanced cleaning procedures. Presented statistics demonstrate that the collected data forms one of the largest available resource of its kind. They also prove suitability of the result for complex NER evaluation campaigns, including an analysis of the most ambiguous name mentions appearing in the data.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Resource Based View: A Promising New Theory for Healthcare Organizations; Comment on “Resource Based View of the Firm as a Theoretical Lens on the Organisational Consequences of Quality Improvement”

This commentary reviews a recent piece by Burton and Rycroft-Malone on the use of Resource Based View (RBV) in healthcare organizations. It first outlines the core content of their piece. It then discusses their attempts to extend RBV to the analysis of large scale quality improvement efforts in healthcare. Some critique is elaborated. The broader question of why RBV seems to be migrating into ...

متن کامل

Frank : Frank: The LOD Cloud at Your Fingertips

Large-scale, algorithmic access to LOD Cloud data has been hampered by the absence of queryable endpoints for many datasets, a plethora of serialization formats, and an abundance of idiosyncrasies such as syntax errors. As of late, very large-scale — hundreds of thousands of document, tens of billions of triples — access to RDF data has become possible thanks to the LOD Laundromat Web Service. ...

متن کامل

Concurrent control on resource planning and revenue/expenditure estimation in large-scale shell material embankment projects management using discrete-event simulation

Resource planning in large-scale construction projects has been a complicated management issue requiring mechanisms to facilitate decision making for managers. In the present study, a computer-aided simulation model is developed based on concurrent control of resources and revenue/expenditure. The proposed method responds to the demand of resource management and scheduling in shell material emb...

متن کامل

The Design and Implementation of the Wave Transactional Filesystem

This paper introduces the Wave Transactional Filesystem (WTF), a novel, transactional, POSIX-compatible filesystem based on a new file slicing API that enables efficient file transformations. WTF provides transactional access to a distributed filesystem, eliminating the possibility of inconsistencies across multiple files. Further, the file slicing API enables applications to construct files fr...

متن کامل

Frank: Algorithmic Access to the LOD Cloud

Large-scale, algorithmic access to LOD Cloud data has been hampered by the absence of queryable endpoints for many datasets, a plethora of serialization formats, and an abundance of idiosyncrasies such as syntax errors. As of late, very large-scale – hundreds of thousands of document, tens of billions of triples – access to RDF data has become possible thanks to the LOD Laundromat Web Service. ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2016